SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
Chen, Siyi, Uy, Mikaela Angelina, Song, Chan Hee, Ladhak, Faisal, Murali, Adithyavairavan, Qu, Qing, Birchfield, Stan, Blukis, Valts, Tremblay, Jonathan
Vision Language Models (VLMs) demonstrate strong qualitative visual understanding, but struggle with the metrically precise spatial reasoning required for embodied applications. The agentic paradigm promises that VLMs can use a wide variety of tools to augment these capabilities, such as depth estimators, segmentation models, and pose estimators. Yet how to realize this vision without relying solely on handcrafted prompting strategies, or enforcing fixed, predefined tool pipelines that limit VLMs' ability to discover optimal tool-use patterns, remains an open challenge. Reinforcement learning could close this gap, but has so far been limited to reasoning with a single visual tool due to the large search space of multi-tool reasoning. We introduce Double Interactive Reinforcement Learning (DIRL), a two-phase training framework in which VLMs learn to coordinate multiple tools through interactive exploration and feedback. In the teaching phase, we combine demonstrations from a single-tool specialist trained via interactive RL with traces from a frontier model using all tools. In the exploration phase, the model further refines multi-tool coordination through continued RL. Our model, SpaceTools, with tool-augmented spatial reasoning ability, achieves state-of-the-art performance on spatial understanding benchmarks (RoboSpatial-Home, BLINK, BOP-ASK) and demonstrates reliable real-world manipulation using a 7-DOF robot as a tool. DIRL provides substantial improvements over the vanilla SFT (+12% on RoboSpatial) and RL (+16% on RoboSpatial) baselines. Project page: https://spacetools.github.io/.
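As a rough illustration of the tool-augmented reasoning loop the abstract describes, here is a toy sketch in which a policy (standing in for the trained VLM) chooses among spatial tools and feeds their outputs back into its context. Every tool, name, and value below is a hypothetical stand-in, not the paper's implementation.

```python
# All tools and values are toy stand-ins for real depth/segmentation/pose models.
TOOLS = {
    "depth": lambda img: {"median_depth_m": 1.7},
    "segment": lambda img: {"objects": ["mug", "table"]},
    "pose": lambda img: {"mug_pose_xyz": [0.4, 0.1, 0.9]},
}

def answer_spatial_query(image, query, policy, max_calls=5):
    """Let `policy` (the VLM stand-in) pick tools until it emits an answer."""
    context = [query]
    for _ in range(max_calls):
        name, arg = policy(context)               # e.g. ("depth", image)
        if name == "answer":
            return arg
        context.append((name, TOOLS[name](arg)))  # feed the tool output back
    return "no answer"

# Trivial scripted policy, standing in for the trained model:
def demo_policy(context):
    return ("depth", None) if len(context) == 1 else ("answer", "about 1.7 m")

print(answer_spatial_query(None, "How far away is the mug?", demo_policy))
```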
- North America > United States > Ohio (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (2 more...)
- Education (0.68)
- Leisure & Entertainment (0.67)
To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty
Yadkori, Yasin Abbasi, Kuzborskij, Ilja, György, András, Szepesvári, Csaba
We explore uncertainty quantification in large language models (LLMs), with the goal of identifying when the uncertainty in responses to a query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from a lack of knowledge about the ground truth (such as about facts or the language), and the latter comes from irreducible randomness (such as multiple possible answers). In particular, we derive an information-theoretic metric that reliably detects when only epistemic uncertainty is large, in which case the output of the model is unreliable. This condition can be computed based solely on model outputs obtained through special iterative prompting that feeds previous responses back into the prompt. Such quantification, for instance, allows us to detect hallucinations (cases when epistemic uncertainty is high) in both single- and multi-answer responses. This is in contrast to many standard uncertainty quantification strategies (such as thresholding the log-likelihood of a response), which cannot detect hallucinations in the multi-answer case. We conduct a series of experiments that demonstrate the advantage of our formulation. Further, our investigations shed some light on how the probabilities an LLM assigns to a given output can be amplified by iterative prompting, which might be of independent interest.
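As a rough illustration of the iterative-prompting idea, the sketch below samples answers, re-prompts with a previous answer injected, and computes a crude plug-in estimate of the mutual information between the two rounds; strong dependence suggests high epistemic uncertainty. The `query_llm` stub and the plug-in estimator are illustrative assumptions, not the paper's exact metric (which is a properly derived lower bound).

```python
import math
import random
from collections import Counter

def query_llm(prompt: str) -> str:
    # Toy stand-in so the sketch runs end-to-end; replace with a real LLM call.
    return random.choice(["Paris", "Paris", "Lyon"])

def epistemic_score(question: str, n_samples: int = 50) -> float:
    # Round 1: sample answers to approximate the marginal distribution P(A).
    first = [query_llm(question) for _ in range(n_samples)]
    marginal = Counter(first)

    # Round 2: re-prompt with a previously sampled answer injected. If the
    # model truly knows the answer, the injected response should barely move
    # the distribution of its next answer.
    pairs = [(a, query_llm(f"{question}\nA previously given answer: {a}\n{question}"))
             for a in first]
    joint = Counter(pairs)
    second_marginal = Counter(b for _, b in pairs)

    # Plug-in estimate of the mutual information between the two rounds
    # (illustrative only; the paper derives a proper lower bound).
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n_samples
        p_a = marginal[a] / n_samples
        p_b = second_marginal[b] / n_samples
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi  # larger => round-2 answers depend on round-1 => epistemic uncertainty

print(epistemic_score("What is the capital of France?"))
```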
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States (0.14)
- Europe > Ireland (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
Tang, Xiangru, Qian, Bill, Gao, Rick, Chen, Jiakang, Chen, Xinyun, Gerstein, Mark
Pre-trained large language models have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of specialized domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate large language models (LLMs) in generating bioinformatics-specific code. BioCoder spans a broad spectrum of the field and covers cross-file dependencies, class declarations, and global variables. It incorporates 1026 Python functions and 1243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate many models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we finetuned StarCoder, demonstrating how our dataset can effectively enhance the performance of LLMs on our benchmark (by >15% Pass@K in certain prompt configurations, and by >3% in all of them). The results highlight two key aspects of successful models: (1) Successful models accommodate a long prompt (> ~2600 tokens) with full context, for functional dependencies. (2) They contain specific domain knowledge of bioinformatics, beyond just general coding knowledge. This is evident from the performance gain of GPT-3.5/4 compared to the smaller models on the benchmark (50% vs up to ~25%). Our dataset, benchmark, Docker images, and scripts required for testing are all available at https://github.com/gersteinlab/biocoder.
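The abstract reports gains "in terms of Pass@K". For reference, here is the standard unbiased Pass@k estimator (Chen et al., 2021) that code-generation benchmarks commonly use; whether BioCoder computes it exactly this way is an assumption.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: n samples generated per task, c of which pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 20 generations, 3 pass: Pass@1 = 0.15, Pass@10 ~= 0.89
print(pass_at_k(20, 3, 1), pass_at_k(20, 3, 10))
```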
- Asia > Middle East > Jordan (0.04)
- Asia > Vietnam > Long An Province (0.04)
- Overview (0.92)
- Research Report > New Finding (0.45)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Genetic Disease (0.67)
- Health & Medicine > Therapeutic Area > Oncology (0.45)
- Health & Medicine > Therapeutic Area > Hematology (0.45)
Artificial Intelligence for EEG Prediction: Applied Chaos Theory
In the present research, we delve into the intricate realm of electroencephalogram (EEG) data analysis, focusing on sequence-to-sequence prediction of data across 32 EEG channels. The study harmoniously fuses the principles of applied chaos theory and dynamical systems theory to engender a novel feature set, enriching the representational capacity of our deep learning model. The endeavour's cornerstone is a transformer-based sequence-to-sequence architecture, calibrated meticulously to capture the non-linear and high-dimensional temporal dependencies inherent in EEG sequences. Through judicious architecture design, parameter initialisation strategies, and optimisation techniques, we have navigated the intricate balance between computational expediency and predictive performance. Our model stands as a vanguard in EEG data sequence prediction, demonstrating remarkable generalisability and robustness. The findings not only extend our understanding of EEG data dynamics but also unveil a potent analytical framework that can be adapted to diverse temporal sequence prediction tasks in neuroscience and beyond.
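As a hedged sketch of the two ingredients the abstract combines, the snippet below builds a dynamical-systems feature (a Takens delay embedding, one standard chaos-theory tool) and feeds it to a transformer sequence-to-sequence model in PyTorch. All dimensions and hyperparameters are illustrative assumptions; the paper's actual feature set and architecture are not given in the abstract.

```python
import torch
import torch.nn as nn

def delay_embed(x: torch.Tensor, dim: int = 3, lag: int = 5) -> torch.Tensor:
    """Map a (time, channels) signal to (time', channels * dim) by stacking
    lagged copies of itself — a standard delay embedding."""
    T = x.shape[0] - (dim - 1) * lag
    slices = [x[i * lag : i * lag + T] for i in range(dim)]
    return torch.cat(slices, dim=-1)

eeg = torch.randn(1000, 32)              # 1000 timesteps, 32 EEG channels
feats = delay_embed(eeg)                 # (990, 96) embedded features

model = nn.Transformer(d_model=96, nhead=8, batch_first=True)
src = feats[None, :500]                  # past window
tgt = feats[None, 500:]                  # future window to predict
out = model(src, tgt)                    # (1, 490, 96) sequence predictions
```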
- North America > United States > New York (0.04)
- North America > Mexico > Gulf of Mexico (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.67)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
Python Tutorial: Image processing with Python (Using OpenCV)
In this tutorial, you will learn how to process images in Python using the OpenCV library. OpenCV is a free, open-source library for real-time image processing. It can process images, videos, and even live streams, but in this tutorial we will work with still images as a first step. Before getting started, let's install OpenCV; once it is installed, we are ready to begin.
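A minimal example of the kind of processing the tutorial covers (install OpenCV first with `pip install opencv-python`; the file name is a placeholder):

```python
import cv2

img = cv2.imread("image.jpg")                 # load the image as a BGR array
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # smooth to reduce noise
edges = cv2.Canny(blurred, 100, 200)          # detect edges
cv2.imwrite("edges.jpg", edges)               # save the result
```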
Drift Detection Using TorchDrift for Tabular and Time-series Data – Towards AI
Originally published on Towards AI. Machine learning models are designed to make predictions based on data. However, the data in the real world is constantly changing, and this can affect the accuracy of the model. This is known as data drift, and it can lead to incorrect predictions and poor performance. In this blog post, we will discuss how to detect data drift using the Python library TorchDrift.
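A minimal sketch of what that looks like for tabular data, assuming the TorchDrift detector API (`pip install torchdrift`); the data here is synthetic and the shift is simulated:

```python
import torch
import torchdrift

# Reference (training-time) data and new production data, as float features.
ref = torch.randn(500, 16)            # 500 rows, 16 tabular features
prod = torch.randn(200, 16) + 0.5     # shifted mean: simulated drift

detector = torchdrift.detectors.KernelMMDDriftDetector()
detector.fit(ref)                     # store the reference distribution
score = detector(prod)                # MMD distance to the reference
p_value = detector.compute_p_value(prod)
print(f"MMD {score.item():.4f}, p-value {p_value.item():.4f}")  # small p => drift
```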
- Transportation > Ground (0.40)
- Transportation > Passenger (0.34)
TensorFlow 101: Basics for Beginners
TensorFlow is a library that helps engineers build and train deep learning models. It provides all the tools we need to create neural networks. We can use TensorFlow to train simple to complex neural networks on large datasets. TensorFlow is used in a variety of applications, from image and speech recognition to natural language processing and robotics. It enables us to quickly and easily build powerful AI models with high accuracy and performance. TensorFlow also runs on GPUs and TPUs, specialized processors that greatly accelerate training and inference.
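A minimal example of the workflow described above: define, compile, and train a small neural network with TensorFlow/Keras on toy data (the dataset and architecture are illustrative):

```python
import numpy as np
import tensorflow as tf

# Toy dataset: 1000 samples, 20 features, binary labels.
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]
```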
Data Science Prerequisites - NumPy - Pandas - Seaborn
An excellent choice for both beginners and experts looking to expand their knowledge of one of the most popular Python libraries in the world! If you've spent time in spreadsheet software like MS Excel or Google Sheets and want to take your data analysis skills to the next level, this course is for you! Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Pandas is the most powerful and flexible open-source data analysis and manipulation tool available in any language.
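A small taste of the spreadsheet-style operations the course builds on, using a toy table (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Boston"],
    "sales": [100, 250, 175, 300],
})
high = df[df["sales"] > 150]                 # row filtering, like a spreadsheet filter
totals = df.groupby("city")["sales"].sum()   # aggregation, like a pivot table
print(totals)
```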